Programmable network elements are the building blocks of the emerging Global Environment for Networking Innovations (GENI) initiative, which targets the development of the next generation of networking and distributed system architectures. The hardware platforms for these programmable network elements must be versatile: they must facilitate rapid development of a wide range of high-throughput packet processing applications. Programmable network processors (NPs) – and, more generally, the emerging class of multicore multithreaded processors – provide the foundation for designing such network elements. Unfortunately, platforms based on today's NPs are not versatile; achieving high packet throughput on these platforms often requires enormous programming effort. Today, much of this effort is spent in addressing the memory bottleneck. In particular, programmers struggle (1) to effectively utilize the several low-level mechanisms (such as hardware multithreading, exposed memory hierarchies, and asynchronous memory accesses) supported by NPs, and (2) to bridge the gap between the fixed configurations of low-level mechanisms supported by NPs and the configurations required for specific deployments.
In this talk, I will present our NP architecture, which achieves versatility through malleability. I will first demonstrate that, to achieve versatility, NP architectures (and multicore multithreaded architectures in general) must allow chip resources to be traded off dynamically between the two main mechanisms for mitigating memory access overhead: (1) data caching, which reduces the average latency by exploiting locality, and (2) hardware multithreading, which hides the latency by exploiting packet-level parallelism. I will then present a novel architecture that achieves this malleability. Finally, I will demonstrate that our malleable processor, designed with the same chip area as Intel's IXP2800 (a state-of-the-art NP), improves throughput by an average of 98% over the IXP2800 across all the deployments we consider; in about one-third of the deployments, the throughput improvement is as large as 300%. I will also argue that our malleable NP architecture is substantially easier to program than today's commercial network processors.